Regex for replacing missing value period

It looks like all your floats are "0.#" format. Why not try to replace just the single value ".,". This way the search is looking the actual value that needs replacing. And although my python isn't perfect you get the idea with...

Expand|Select|Wrap|Line Numbers

 # line='2.5,3,.,100,.,4.10,.,8,.,.,.,8.9,.'

# line.replace('.,','0.0,')

# '2.5,3,0.0,100,0.0,4.10,0.0,8,0.0,0.0,0.0,8.9,0.0'

This way you don't loose the field and mess the database up completely. If you don't need the "0.0" you can write it in to the code that it doesn't display.

Dec 20 '10 #7

akselo

The option of searching for the literal is at one level compelling

Expand|Select|Wrap|Line Numbers

 
# line='2.5,3,.,100,.,4.10,.,8,.,.,.,8.9,.'

# line.replace('.,','0.0,')

# '2.5,3,0.0,100,0.0,4.10,0.0,8,0.0,0.0,0.0,8.9,0.0'

But this doesn't work if the last item is a period and not a number as the replacement looks for a trailing comma. This could of course be handled separately, but I was hoping for a both general and clean solution to this issue.

Also, what is the "best" way of dealing with the carriage return in the last item of each line when iterating over lines in a file? Replacing \n with blank space?

Dec 20 '10 #8

Thekid

145

100+

I was going to suggest just replacing the 'period comma' with a blank space which works except if the last value is a period:

Expand|Select|Wrap|Line Numbers

 
# line='2.5,3,.,100,.,4.10,.,8,.,.,.,8.9,.'

# line.replace('.,','')

# '2.5,3,100,4.10,8,8.9,.'

Dec 20 '10 #9

You could switch the comma placement from "0.0," to ",0.0".

Expand|Select|Wrap|Line Numbers

# line.replace(',.',',0.0')

And generally the new line at the end of the is to keep the format of the table or database.

If I split a database in to fields. Every field needs to have some value thus the ".".

AA|AB|AC|AD
BA|BB|BC|BD
CA|CB|CC|CD
DA|DB|DC|DD

If I have a 4 field database and I kill the \n it becomes a run-on and then the parsing engine looses everything beyond it's initial 4 fields.

AA|AB|AC|AD|BA|BB|BC|BD|CA|CB|CC|CD|DA|DB|DC|DD

to the engine looks like AA|AB|AC|AD

Now If you just want a longlist of numbers and dont care what the numbers mean you can in most languages just replace the "\n".

Expand|Select|Wrap|Line Numbers

 
# # line='2.5,3,.,100,.,4.10,.,8,.,.,.,8.9,.\n2.5,3,.,100,.,4.10,.,8,.,.,.,8.9,.'

# # line.replace(',.',',0.0')

# # line.replace('\n',',')

# # '2.5,3,0.0,100,0.0,4.10,0.0,8,0.0,0.0,0.0,8.9,0.0,2.5,3,0.0,100,0.0,4.10,0.0,8,0.0,0.0,0.0,8.9,0.0'

Dec 20 '10 #10

Thekid

145

100+

I think another issue will be if a value is a fraction like:
line='.,.45,.25,.,100,7.5,.'

Dec 20 '10 #11

akselo

Last column values are often 'period'. How about a pattern like

Expand|Select|Wrap|Line Numbers

 
#line='ACSSF,2009e5,ca,000,0100,0013344,309,11,0,0,0,0,11,0,10,0,0,0,0,10,0,93,0,0,0,38,55,0,54,0,0,0,12,42,0,43,12,0,8,21,2,0,20,11,0,0,9,0,0,78,33,39,6,0,0,0,593,0,0,0,0,0,0,0,0,17,0,0,0,0,0,11,0,0,0,0,17,10,45,308,185,789200,909500,1000001,.,.,.,.,.,.,.,.,.,.,.,.,.,593,474,159,49,110,0,315,119,541772500,438522500,103250000,.,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.,593,474,0,0,0,0,0,0,0,0,0,17,20,22,62,69,284,119,0,0,0,0,0,32,0,0,19,30,38,2909,4001,628,2035900,1966500,69300,593,474,49,48,91,81,51,49,8,39,58,0,119,77,42,0,0,0,0,0,0,0,0,19.4,23.0,7.7,593,0,0,0,0,0,0,0,41,0,0,0,10,31,0,358,144,81,44,31,58,0,194,163,0,7,8,16,0,593,0,0,0,32,0,19,30,21,17,0,37,22,415\n'

#patt = re.compile(r'\D\.\D')

#'ACSSF,2009e5,ca,000,0100,0013344,309,11,0,0,0,0,11,0,10,0,0,0,0,10,0,93,0,0,0,38,55,0,54,0,0,0,12,42,0,43,12,0,8,21,2,0,20,11,0,0,9,0,0,78,33,39,6,0,0,0,593,0,0,0,0,0,0,0,0,17,0,0,0,0,0,11,0,0,0,0,17,10,45,308,185,789200,909500,1000001,0.0,.,0.0,.,0.0,.,0.0,.,0.0,.,0.0,.,0.0,593,474,159,49,110,0,315,119,541772500,438522500,103250000,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,593,474,0,0,0,0,0,0,0,0,0,17,20,22,62,69,284,119,0,0,0,0,0,32,0,0,19,30,38,2909,4001,628,2035900,1966500,69300,593,474,49,48,91,81,51,49,8,39,58,0,119,77,42,0,0,0,0,0,0,0,0,19.4,23.0,7.7,593,0,0,0,0,0,0,0,41,0,0,0,10,31,0,358,144,81,44,31,58,0,194,163,0,7,8,16,0,593,0,0,0,32,0,19,30,21,17,0,37,22,415\n'

It doesn't give me the desired result, but conceptually straightforward: return a match whereever a period exists if surrounded by non-digits. Repeat periods cause grief here.

Dec 20 '10 #12

From the code that you were showing at the beginning all your fractions were in were in a "0.#" format. But if that is an issue try...

Expand|Select|Wrap|Line Numbers

 line.replace(',.',',0.')

line.replace('0.,',',0.0,')

line.replace('\n',',')

( On a side note I think that the regex for the find would be ",\.," or ",[\.],". And to keep out the numbers and letters, ",(*[^0-9a-zA-Z\s][^\.]*[^0-9a-zA-Z\s])," but my regex is not perfect.)

Dec 20 '10 #13

Thekid

145

100+

I don't have python on this particular computer but how about:

Expand|Select|Wrap|Line Numbers

 
line='ACSSF,2009e5,ca,000,0100,0013344,309,11,0,0,0,0,11,0,10,0,0,0,0,10,0,93,0,0,0,38,55,0,54,0,0,0,12,42,0,43,12,0,8,21,2,0,20,11,0,0,9,0,0,78,33,39,6,0,0,0,593,0,0,0,0,0,0,0,0,17,0,0,0,0,0,11,0,0,0,0,17,10,45,308,185,789200,909500,1000001,.,.,.,.,.,.,.,.,.,.,.,.,.,593,474,159,49,110,0,315,119,541772500,438522500,103250000,.,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.,593,474,0,0,0,0,0,0,0,0,0,17,20,22,62,69,284,119,0,0,0,0,0,32,0,0,19,30,38,2909,4001,628,2035900,1966500,69300,593,474,49,48,91,81,51,49,8,39,58,0,119,77,42,0,0,0,0,0,0,0,0,19.4,23.0,7.7,593,0,0,0,0,0,0,0,41,0,0,0,10,31,0,358,144,81,44,31,58,0,194,163,0,7,8,16,0,593,0,0,0,32,0,19,30,21,17,0,37,22,415\n' 
 
newline1=line.replace('.,','')

newline2=newline1.replace(',.\n','') #incase last value is a period

...or you could make your replacement patterns contain '.,',,.\n','\n'

Dec 20 '10 #14

bvdet

2,851

Expert Mod 2GB

This is a CSV file, right? All data is separated by a comma or other known delimiter. This is the easiest way:

Expand|Select|Wrap|Line Numbers

 line = '.,ACSSF,2009e5,ca,000,0100,0013344,309,11,0,0,0,0,11,0,10,0,0,0,0,10,0,93,0,0,0,38,55,0,54,0,0,0,12,42,0,43,12,0,8,21,2,0,20,11,0,0,9,0,0,78,33,39,6,0,0,0,593,0,0,0,0,0,0,0,0,17,0,0,0,0,0,11,0,0,0,0,17,10,45,308,185,789200,909500,1000001,.,.,.,.,.,.,.,.,.,.,.,.,.,593,474,159,49,110,0,315,119,541772500,438522500,103250000,.,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,.,593,474,0,0,0,0,0,0,0,0,0,17,20,22,62,69,284,119,0,0,0,0,0,32,0,0,19,30,38,2909,4001,628,2035900,1966500,69300,593,474,49,48,91,81,51,49,8,39,58,0,119,77,42,0,0,0,0,0,0,0,0,19.4,23.0,7.7,593,0,0,0,0,0,0,0,41,0,0,0,10,31,0,358,144,81,44,31,58,0,194,163,0,7,8,16,0,593,0,0,0,32,0,19,30,21,17,0,37,22,415,.\n'

lineList = line.split(',')

output = []

for item in lineList:

    if item.strip() == ".":

        output.append("***")

    else:

        output.append(item)

print ",".join(output)

Output:

Expand|Select|Wrap|Line Numbers

 >>> ***,ACSSF,2009e5,ca,000,0100,0013344,309,11,0,0,0,0,11,0,10,0,0,0,0,10,0,93,0,0,0,38,55,0,54,0,0,0,12,42,0,43,12,0,8,21,2,0,20,11,0,0,9,0,0,78,33,39,6,0,0,0,593,0,0,0,0,0,0,0,0,17,0,0,0,0,0,11,0,0,0,0,17,10,45,308,185,789200,909500,1000001,***,***,***,***,***,***,***,***,***,***,***,***,***,593,474,159,49,110,0,315,119,541772500,438522500,103250000,***,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,***,593,474,0,0,0,0,0,0,0,0,0,17,20,22,62,69,284,119,0,0,0,0,0,32,0,0,19,30,38,2909,4001,628,2035900,1966500,69300,593,474,49,48,91,81,51,49,8,39,58,0,119,77,42,0,0,0,0,0,0,0,0,19.4,23.0,7.7,593,0,0,0,0,0,0,0,41,0,0,0,10,31,0,358,144,81,44,31,58,0,194,163,0,7,8,16,0,593,0,0,0,32,0,19,30,21,17,0,37,22,415,***

>>>

Dec 20 '10 #15

Expand|Select|Wrap|Line Numbers

 # newline1=line.replace('.,','')

# newline2=newline1.replace(',.\n','') #incase last value is a period

This code works. The only problem is a float that is an int... e.g in C# and C++ you values like 100f == (float)100 == 100.0, which could be in your db as ",100.,".

Your second replace deletes the field and mashes the rows together screwing the db. Try instead....

Expand|Select|Wrap|Line Numbers

 # newline1=line.replace('.,','')

# newline2=newline1.replace(',.\n',',0.0\n') #incase last value is a period

(I don't know about python but in c, c++ and c# you can chain code together like... "temp=line.replace('.,','').replace(',.\n',',0.0\n ')". To me its easier to think this way.)

Dec 20 '10 #16

akselo

Thanks all for your help. I must say that it makes sense to just use the built-in split-to-list feature of strings and check each element. It seems pretty safe. However, it takes a long time to process a large dataset and check each value to see if it is a period, in this case the census numbers for the state of California. But efficiency doesn't matter that much in this case as releases are rare.

Dec 21 '10 #17

dwblas

626

Expert 512MB

it takes a long time to process a large dataset and check each value to see if it is a period

Only do that if it is necessary, i.e. if you are summing one field of the list for example, check for a "." and ignore (since it is zero). When you are already iterating through the list it takes an insignificant amount of time since it is one additional if() statement. You do not have to change every field, just check the ones you are using.

Dec 22 '10 #18

Similar topics

Using Missing.Value in C#

by: DN | last post by:

Is it possible to use Missing.Value in C#, or is that just a VB concept? I'm using Persits ASPEmail component in an ASP.NET page, and their sample code shows something like this: <%@ Page...

regex: replacing one group of a match

by: darrel | last post by:

I'm new to .net's regex capabilities. I've made a regex: dim r as new regex( _ "(?<tagOpening>(<))" & _ "(?<theTag>(" & tagToFind & "))" & _ "(?<anything>(.*))" & _ "(?<theAttribute>(" &...

Replacing table value with different value in VB

by: amraam35 | last post by:

Here is the problem. I have made a log in script that prompts you to log in with a username and password as the "splash screen" It is an unbounded forum that checks that the username and password...

Microsoft Access / VBA

what is the best way to do this (replacing xmlnode value)

by: dotnetnoob | last post by:

i need to copy several xml files and use it as xml element and insert them into a xml file. i use DOM support in .Net and from the pervious help i was able to copy and insert them into the xml file...

.NET Framework

Replacing NULL value in multiple columns in a table

by: BF | last post by:

Hi, I have some tables where I import data in, lots of field have gotten a NULL value which the application can not handle. Now can I replace each NULL value with '' in a columns with: update...

Microsoft SQL Server

Regex: replacing \n and spaces

by: Trev | last post by:

Hi everyone, I'm having some problems using Regex; I have a long string that is delimetered in a random fashion by a combination of spaces and \n's for newlines. I have five possiblities: //...

C# / C Sharp

Replacing the Value of a particular Dropdownlist Item

by: Jason | last post by:

The following code removes a particular item from my drop down. Instead, I would like to replace it's value with another string. How can I do that in vb.net? Dim ddl =...

Query from two tables that looks for a missing value

by: silversubey | last post by:

Hello, I have two tables: CltDue, Checklist SELECT CltDue.CDClientname, CltDue.CDID FROM CltDue WHERE CLtdue.CDstatus = 'More Client info reqst' shows the following:

Microsoft SQL Server

Property missing value

by: shapper | last post by:

Hello, I have a control with the following property: Private _Sections As New List(Of FormSection) Public Property Sections() As List(Of FormSection) Get If _Sections Is Nothing Then...

missing value in datagridview

by: isisqiero | last post by:

I use a list of names in a Word document as the value of SELECT statement's parameter. However, my datagridview shows only the column names without the data I am hoping to see. Public Class Form1 ...

.NET Framework

Cloud Servers without Credit Card and Email Registration: A Simpler Way to Get on the Cloud

by: CloudSolutions | last post by:

Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...

Access Europe: Command bars, the Access Shortcut Tool and a simple Audit Log - Wed 3 April

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel